14 research outputs found

    Incremental file reorganization schemes

    Issued as Final project report, Project no. G-36-66

    Design and implementation of a time warp parallel database system

    Issued as Report, Project C-36-68

    Efficiency and Security Trade-Off in Supporting Range Queries on Encrypted Databases

    The database-as-a-service (DAS) model is a newly emerging computing paradigm in which DBMS functions are outsourced. It is desirable to store data on database servers in encrypted form to reduce security and privacy risks, since the server may not be fully trusted, but this usually means sacrificing functionality and efficiency for security. Several approaches have been proposed in the recent literature for efficiently supporting queries on encrypted databases. These approaches differ in how the index of attribute values is created; random one-to-one mapping and order-preserving mapping are two examples. In this paper we adapt a prefix-preserving encryption scheme to create the index. All of these approaches seek a convenient trade-off between efficiency and security. We discuss the security and efficiency of these approaches for supporting range queries on encrypted numeric data.
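
As an illustration of the general idea (not this paper's actual construction), the following toy sketch encrypts a fixed-width integer bit by bit, flipping each bit with a keyed PRF of the preceding plaintext bits; the key and bit width are hypothetical parameters. Values that share a k-bit plaintext prefix then share a k-bit ciphertext prefix, which is what lets a server answer range queries as prefix matches:

```python
import hmac, hashlib

KEY = b"demo-key"  # hypothetical key, for illustration only

def _prf_bit(key, prefix):
    # keyed PRF mapping a plaintext bit-prefix to one pseudorandom bit
    return hmac.new(key, prefix.encode(), hashlib.sha256).digest()[0] & 1

def pp_encrypt(value, width=8, key=KEY):
    """Toy prefix-preserving encryption of an integer in [0, 2**width).

    Bit i of the ciphertext is plaintext bit i XORed with a PRF of the
    first i plaintext bits, so equal plaintext prefixes yield equal
    ciphertext prefixes (and the first differing bit still differs)."""
    bits = format(value, "0{}b".format(width))
    return "".join(str(int(b) ^ _prf_bit(key, bits[:i]))
                   for i, b in enumerate(bits))
```

Because the prefix structure survives encryption, a range such as [176, 191] (all values with prefix `1011`) becomes a single ciphertext-prefix match on the server; the security cost is exactly that this structure leaks.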

    Image Mining: A New Approach for Data Mining

    We introduce a new focus for data mining, concerned with knowledge discovery in image databases. We expect all aspects of data mining to be relevant to image mining, but in this first work we concentrate on the problem of finding associations. To that end, we present a data mining algorithm to find association rules in 2-dimensional color images. The algorithm has four major steps: feature extraction, object identification, auxiliary image creation, and object mining. Our algorithm is general in that it does not rely on any domain knowledge. A synthetic image set containing geometric shapes was generated to test our initial implementation. Our experimental results show that image mining is feasible. We also suggest several directions for future work in this area.
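
The object-mining step can be pictured as ordinary market-basket mining once each image has been reduced to a set of object labels. A minimal sketch, with hypothetical per-image object sets standing in for the output of the feature-extraction and object-identification steps:

```python
from itertools import combinations
from collections import Counter

# Hypothetical object sets, one per image (stand-ins for the
# feature-extraction and object-identification steps)
images = [
    {"circle", "square"},
    {"circle", "square", "triangle"},
    {"circle", "triangle"},
    {"square", "triangle"},
]

def frequent_pairs(transactions, min_support=0.5):
    """Return object pairs co-occurring in at least min_support of images."""
    n = len(transactions)
    counts = Counter()
    for t in transactions:
        for pair in combinations(sorted(t), 2):
            counts[pair] += 1
    return {p: c / n for p, c in counts.items() if c / n >= min_support}
```

Here each image plays the role of a transaction; real image mining additionally has to decide what counts as "the same object" across images.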

    The Sensible Sharing Approach to a Scalable, High-Performance Database System

    Exploiting parallelism has become the key to building high-performance database systems. Several approaches to building database systems that support both inter- and intra-query parallelism have been proposed. These approaches can be broadly classified as either Shared Nothing (SN) or Shared Everything (SE). Although the SN approach is highly scalable, it requires complex data partitioning and tuning to achieve good performance, whereas the SE approach suffers from non-scalability. We propose a sensible sharing approach that combines the advantages of both SN and SE. We propose an architecture, and data partitioning and scheduling strategies, that promote sensible sharing. We analyze the performance and scalability of our approach and compare it with that of an SN system. We find that for a variety of workloads and data skews our approach performs and scales at least as well as an SN system that uses the best possible data partitioning strategy.

    Avoiding Conflicts between Reads and Writes Using Dynamic Versioning

    In this paper, we discuss a new approach to multi-version concurrency control, called Dynamic Versioning, that avoids the data contention due to conflicts between Reads and Writes. A data item is allowed to have several committed versions and at most one uncommitted version. A conflict between a Read and a Write is resolved by imposing an order between the requesting transactions and allowing the Read to access one of the committed versions. The space overhead is reduced to the minimum possible by making the versions dynamic: a version exists only as long as it may be accessed by an active transaction. Conditional lock compatibilities are used to provide serializable access to the multiple versions. Results from simulation studies indicate that the dynamic versioning method, with little space overhead (about 1% of the size of the database), significantly reduces blocking (by 60% to 90%) compared to single-version two-phase locking. Lower blocking rates increase transaction throughput and reduce the variance in transaction response times through better utilization of resources. The approach also reduces starvation of short transactions and subsumes previous methods proposed for supporting long-running queries. The dynamic versioning method can be easily incorporated into existing DBMSs. The modifications required in the lock manager and storage manager modules to implement dynamic versioning are discussed.
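
A minimal sketch of the versioning idea, ignoring the lock compatibilities, transaction ordering, and garbage collection that the paper handles: each item keeps several committed versions and at most one uncommitted version, so a Read never blocks on a concurrent Write:

```python
class VersionedItem:
    """Toy dynamic-versioning data item: several committed versions and
    at most one uncommitted version (class and method names are hypothetical)."""
    def __init__(self, value):
        self.committed = [value]   # oldest .. newest committed versions
        self.uncommitted = None

    def write(self, value):
        if self.uncommitted is not None:
            raise RuntimeError("at most one uncommitted version allowed")
        self.uncommitted = value   # a writer never blocks readers

    def read(self):
        return self.committed[-1]  # readers see the newest committed version

    def commit(self):
        self.committed.append(self.uncommitted)
        self.uncommitted = None

    def abort(self):
        self.uncommitted = None    # discard the uncommitted version
```

In the actual scheme, old committed versions are reclaimed as soon as no active transaction can still read them, which is what keeps the space overhead small.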

    Shadow Logging - An Efficient Transaction Recovery Method

    In this paper, we present LU-Logging, an efficient transaction recovery method based on a flexible-redo/minimal-undo algorithm. The paper describes an implementation that avoids the overheads of the deferred updating used in previous no-undo implementations. An update by a transaction to a data record does not immediately modify the record. Instead, it generates a redo log record and associates it with the data page. Each page in the database has an associated log page, which contains the still-uncommitted log records of updates to the data page. The log page is read from and written to disk along with the corresponding data page. This gives the flexibility of applying the redo log records any time after the transaction commits, in particular when the data page is read by another transaction; we call this lazy updating. For aborted transactions, the redo log records are simply discarded. Simulation studies show that the overhead during normal transaction processing for LU-Logging is comparable to that of traditional logging, while crash recovery is shown to be an order of magnitude faster than with traditional logging.
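
The lazy application of redo records can be sketched as follows; the class and method names are hypothetical, and buffering, disk I/O, and crash recovery are elided:

```python
class Page:
    """A data page together with its associated log page of redo records."""
    def __init__(self, data):
        self.data = dict(data)
        self.log = []            # redo records not yet applied to the page

class LULog:
    """Toy sketch of lazy redo application (a simplification of the scheme)."""
    def __init__(self):
        self.committed = set()

    def update(self, page, txn, key, value):
        # an update only generates a redo record; the page itself is untouched
        page.log.append((txn, key, value))

    def commit(self, txn):
        self.committed.add(txn)

    def abort(self, page, txn):
        # for aborted transactions, redo records are simply discarded
        page.log = [r for r in page.log if r[0] != txn]

    def read(self, page, key):
        # lazily apply redo records of committed transactions at read time
        pending = []
        for txn, k, v in page.log:
            if txn in self.committed:
                page.data[k] = v
            else:
                pending.append((txn, k, v))
        page.log = pending
        return page.data[key]
```

Because nothing is applied until a later read (or page flush), aborts cost nothing, and redo work is deferred past commit, which is the source of the fast recovery claim.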

    Evolution in Data Streams

    Conventional data mining deals with static data stored on disk, for example the current state of a data warehouse, and the data may be read multiple times to accomplish the mining task. Recently, the data stream paradigm has become the focus of study: data arrives continuously as a sequence of elements, and the mining task has to be done in a single pass. An example is constructing models of the data, as in clustering or classification, in a single pass and with limited memory. Under the data stream model, data arrives as one of multiple potentially infinite streams. Data streams can flow at variable rates, and the underlying models often change with time. Current work in data stream mining does not focus on change ("evolution"), and that is precisely our main focus. Monitoring changes in the models becomes as important as obtaining the models. Therefore, stream data mining not only needs to mine data incrementally and decrementally (in order to keep track of recent data), but also has to provide methods to monitor and detect changes in the underlying models. We call this problem "data evolution." Of equal importance, the mining algorithms themselves need to be adaptive when the flow rate of data streams changes dramatically; that is, an algorithm should be able to downgrade accuracy in order to handle a data burst, or do a more thorough analysis when the data flow is slow. We call this problem "algorithm evolution." We will study both data evolution and algorithm evolution, providing efficient algorithms to incrementally and decrementally mine stream data, techniques to store data models and detect and monitor their changes, and a set of algorithms that can switch from "high resolution" to "low resolution" in order to adapt to the flow rate.
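
The incremental/decremental bookkeeping and a crude form of change detection can be illustrated with a toy sliding-window model; the window size and change threshold are hypothetical parameters, and real change detection would be far more principled:

```python
from collections import deque

class WindowModel:
    """Sliding-window mean maintained incrementally (absorb new element)
    and decrementally (forget oldest element), with naive change detection."""
    def __init__(self, window=4, threshold=2.0):
        self.buf = deque(maxlen=window)
        self.total = 0.0
        self.threshold = threshold

    def add(self, x):
        # flag "evolution" when a new element deviates from the current model
        changed = (len(self.buf) == self.buf.maxlen
                   and abs(x - self.mean()) > self.threshold)
        if len(self.buf) == self.buf.maxlen:
            self.total -= self.buf[0]   # decremental step: drop oldest
        self.buf.append(x)              # deque evicts the oldest automatically
        self.total += x                 # incremental step: absorb newest
        return changed

    def mean(self):
        return self.total / len(self.buf) if self.buf else 0.0
```

An "algorithm evolution" variant would additionally shrink the window (lower resolution) during a burst and grow it (higher resolution) when the stream slows.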

    An Efficient Algorithm for Mining Association Rules in Large Databases

    Mining for association rules between items in a large database of sales transactions has been described as an important database mining problem. In this paper we present an efficient algorithm for mining association rules that is fundamentally different from known algorithms. Compared to previous algorithms, ours reduces both CPU and I/O overheads. In our experimental study we found that for large databases, CPU overhead was reduced by as much as a factor of seven and I/O by almost an order of magnitude; hence the algorithm is especially suitable for very large databases. The algorithm is also ideally suited for parallelization. We have performed extensive experiments and compared the performance of the algorithm with that of one of the best existing algorithms.
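
For reference, the standard support/confidence framing of association rules (the definitions only, not this paper's new algorithm) can be sketched as a rule-generation step over already-computed supports:

```python
def rules_from_pairs(pair_support, item_support, min_conf=0.6):
    """Derive rules A -> B from pair and single-item supports using the
    standard definition confidence(A -> B) = support(A,B) / support(A).
    The min_conf value here is a hypothetical parameter."""
    rules = {}
    for (a, b), s in pair_support.items():
        for x, y in ((a, b), (b, a)):       # try the rule in both directions
            conf = s / item_support[x]
            if conf >= min_conf:
                rules[(x, y)] = conf
    return rules
```

The expensive part that mining algorithms compete on is computing the supports over a large transaction database; this step is cheap once they are known.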

    Adaptive and Automated Index Selection in RDBMS

    We present a novel approach for a tool that assists the database administrator in designing an index configuration for a relational database system. A new methodology for collecting usage statistics at run time is developed, which lets the optimizer estimate query execution costs under alternative index configurations. Defining the workload specification required by existing index design tools can be very complex for a large integrated database system; our tool derives the workload statistics automatically. These statistics are then used to efficiently compute an index configuration. Running a prototype of the tool against a sample database demonstrates that the proposed index configuration is reasonably close to the optimum for the test query sets.
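
The final selection step, picking the configuration that minimizes estimated workload cost, can be sketched as follows; the cost table and query frequencies are hypothetical stand-ins for the optimizer's per-configuration estimates and the collected run-time statistics:

```python
# Hypothetical optimizer estimates: cost of each query under each
# candidate index configuration (stand-ins for real estimates)
costs = {
    "no_index": {"q1": 100, "q2": 100},
    "idx_on_a": {"q1": 10,  "q2": 100},
    "idx_on_b": {"q1": 100, "q2": 5},
}
# query frequencies derived from collected run-time statistics (assumed values)
freq = {"q1": 0.7, "q2": 0.3}

def best_configuration(costs, freq):
    """Pick the configuration with the lowest frequency-weighted cost."""
    def workload_cost(cfg):
        return sum(freq[q] * c for q, c in costs[cfg].items())
    return min(costs, key=workload_cost)
```

A real tool would also weigh index maintenance cost for updates and search the (exponential) configuration space heuristically rather than exhaustively.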